Real-time Top-K Predictive Query Processing over Event Streams
Authors
Abstract
This paper addresses the problem of predicting the k events that are most likely to occur next over historical real-time event streams. Existing approaches to causal prediction queries have two limitations. First, they exhaustively search an acyclic causal network to find the k most likely effect events; however, data from real event streams frequently reflect cyclic causality. Second, they rely on conservative assumptions intended to exclude all possible non-causal links from the causal network, which leads to the omission of many less frequent but important causal links. We overcome these limitations by proposing a novel event precedence model and a run-time causal inference mechanism. The event precedence model incrementally constructs a first-order absorbing Markov chain over event streams, in which an edge between two events signifies a temporal precedence relationship between them, a necessary condition for causality. The run-time causal inference mechanism then learns causal relationships dynamically during query processing by removing temporal precedence relationships that do not exhibit causality in the presence of other events in the event precedence model. This paper presents two query processing algorithms: one performs an exhaustive search on the model, and the other performs a more efficient reduced search with early termination. Experiments using two real datasets (cascading blackouts in power systems and web page views) verify the effectiveness of the probabilistic top-k prediction queries and the efficiency of the algorithms. Specifically, the reduced search algorithm reduced runtime, relative to exhaustive search, by 25-80% (depending on the application) with only a small reduction in accuracy.
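To make the idea of an incrementally built precedence model concrete, the following is a minimal sketch, not the paper's algorithm. It assumes events are plain strings, counts only direct "b followed a" transitions, and ranks successors by empirical transition frequency. The names `PrecedenceModel`, `observe`, and `top_k` are hypothetical, and the sketch omits the absorbing states, the run-time causal inference step, and the reduced search with early termination described in the abstract.

```python
from collections import Counter, defaultdict
import heapq


class PrecedenceModel:
    """Illustrative first-order transition counts over an event stream.

    Hypothetical simplification: an edge a -> b only records that b
    directly followed a; no causal filtering is applied.
    """

    def __init__(self):
        # counts[a][b] = number of times event b directly followed event a
        self.counts = defaultdict(Counter)
        self.last = None

    def observe(self, event):
        """Update the model incrementally with one new event."""
        if self.last is not None:
            self.counts[self.last][event] += 1
        self.last = event

    def top_k(self, current, k):
        """Return up to k (event, probability) pairs most likely to follow `current`."""
        successors = self.counts.get(current)
        if not successors:
            return []
        total = sum(successors.values())
        ranked = heapq.nlargest(k, successors.items(), key=lambda kv: kv[1])
        return [(event, count / total) for event, count in ranked]


model = PrecedenceModel()
for e in ["a", "b", "a", "c", "a", "b", "c", "a"]:
    model.observe(e)

result = model.top_k("a", 2)
print(result)
```

In this toy stream, "b" follows "a" twice and "c" follows it once, so "b" ranks first with an estimated probability of 2/3. This corresponds only to an exhaustive one-step lookup over observed successors.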
Similar resources
Knowledge-infused and Consistent Complex Event Processing over Real-time and Persistent Streams
Emerging applications in Internet of Things (IoT) and Cyber-Physical Systems (CPS) present novel challenges to Big Data platforms for performing online analytics. Ubiquitous sensors from IoT deployments are able to generate data streams at high velocity, that include information from a variety of domains, and accumulate to large volumes on disk. Complex Event Processing (CEP) is recognized as a...
Towards Real-Time Data Stream Processing
Many applications require the continuous tracking of the state of a system in order to detect the occurrence of a particular event. RFID sensors, in particular, have become an increasingly popular means of gathering tracking information about the objects of interest. The need to query these data has spurred research at the intersection of sensor networks and databases. There are a number of cha...
CPR: Complex Pattern Ranking for Evaluating Top-k Pattern Queries over Event Streams
Most existing approaches to complex event processing over streaming data rely on the assumption that the matches to the queries are rare and that the goal of the system is to identify these few matches within the incoming deluge of data. In many applications, such as stock market analysis and user credit card purchase pattern monitoring, however the matches to the user queries are in fact plent...
Query Driven Operator Placement for Complex Event Detection over Data Streams
We consider the problem of efficiently processing subscription queries over data streams in large-scale interconnected sensor networks. We propose a scalable algorithm for distributed data stream processing, applicable on top of any platform granting access to interconnected sensor networks. We make use of a probabilistic algorithm to check whether subscriptions are subsumed by other subscripti...
Top-k Pattern Matching Using an Information-Theoretic Criterion over Probabilistic Data Streams
With the development of data mining technologies for sensor data streams, more sophisticated methods for complex event processing are in demand. In the case of event recognition, since event recognition results may contain errors, we need to deal with the uncertainty of events. We therefore consider probabilistic event data streams with occurrence probabilities of events, and develop a pattern mat...
Journal:
- CoRR
Volume: abs/1508.06976
Issue: -
Pages: -
Publication date: 2014